optimal value function meaning in Chinese
最优值函数
Examples
- The equation of state and the optimal value function used to achieve the optimal strategy are derived through analysis of the conditional probability of the process.
通过条件概率分析,计算出了动态规划状态转移方程和最优期望代价方程,并得到了关联规则发现的决策策略。
- Reinforcement learning algorithms that use the cerebellar model articulation controller (CMAC) are studied to estimate the optimal value function of Markov decision processes (MDPs) with continuous states and discrete actions. The state discretization for MDPs using SARSA-learning algorithms based on CMAC networks and direct gradient rules is analyzed. Two new coding methods for CMAC neural networks are proposed so that the learning efficiency of CMAC-based direct gradient learning algorithms can be improved.
在求解离散行为空间Markov决策过程(MDP)最优策略的增强学习算法研究方面,研究了小脑模型关节控制器(CMAC)在MDP行为值函数逼近中的应用,分析了基于CMAC的直接梯度算法对MDP状态空间离散化的特点,研究了两种改进的CMAC编码结构,即:非邻接重叠编码和变尺度编码,以提高直接梯度学习算法的收敛速度和泛化性能。